x86: use VMLOAD for PV context switch
Having noticed that VMLOAD alone is about as fast as a single of the
involved WRMSRs, I thought it might be a reasonable idea to also use it
for PV. Measurements, however, have shown that an actual improvement can
be achieved only with an early prefetch of the VMCB (thanks to Andrew
for suggesting to try this), which I have to admit I can't really
explain. This way on my Fam15 box context switch takes over 100 clocks
less on average (the measured values are heavily varying in all cases,
though).
This is intentionally not using a new hvm_funcs hook: For one, this is
all about PV, and something similar can hardly be done for VMX.
Furthermore the indirect to direct call patching that is meant to be
applied to most hvm_funcs hooks would be ugly to make work with
functions having more than 6 parameters.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Brian Woods <brian.woods@amd.com>
Acked-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>